Unsupervised Learning of the Morphology of a Natural Language

Author

  • John A. Goldsmith
Abstract

This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size from 5,000 words to 500,000 words. We develop a set of heuristics that rapidly develop a probabilistic morphological grammar, and use MDL as our primary tool to determine whether the modifications proposed by the heuristics will be adopted or not. The resulting grammar matches well the analysis that would be developed by a human morphologist. In the final section, we discuss the relationship of this style of MDL grammatical analysis to the notion of evaluation metric in early generative grammar.
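As a rough illustration of the MDL decision described in the abstract, the sketch below compares the description length of a toy word list with and without a proposed set of suffixes, and adopts the suffixes only if the total shrinks. It is a simplified stand-in, not the paper's model: the function names, the 27-symbol alphabet, the flat per-letter cost, and the independent stem and suffix choices are all assumptions made for this example.

```python
import math
from collections import Counter


def letters_cost(strings, alphabet_size=27):
    """Bits to spell out each string letter by letter, plus one end marker each.

    Assumption: a flat cost of log2(alphabet_size) bits per symbol.
    """
    return sum((len(s) + 1) * math.log2(alphabet_size) for s in strings)


def corpus_cost(tokens):
    """Bits to encode a token sequence under its own unigram distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum(c * math.log2(c / total) for c in counts.values())


def description_length_unsegmented(words):
    """Model = list of whole words; data = one word choice per token."""
    return letters_cost(set(words)) + corpus_cost(words)


def description_length_segmented(words, suffixes):
    """Model = stem list + suffix list; data = stem and suffix choice per token.

    Each word is split at its longest matching suffix that leaves a
    non-empty stem; words with no match take the empty suffix.
    """
    stems, chosen = [], []
    for w in words:
        match = max(
            (s for s in suffixes if w.endswith(s) and len(w) > len(s)),
            key=len,
            default="",
        )
        stems.append(w[: len(w) - len(match)])
        chosen.append(match)
    model = letters_cost(set(stems)) + letters_cost(set(chosen))
    data = corpus_cost(stems) + corpus_cost(chosen)
    return model + data


def adopt(words, suffixes):
    """Accept the proposed suffixes only if they shorten the description."""
    return description_length_segmented(words, suffixes) < description_length_unsegmented(words)


toy_words = [stem + suf for stem in ("jump", "walk", "talk") for suf in ("", "s", "ed", "ing")]
print(adopt(toy_words, ["s", "ed", "ing"]))
```

On this toy list the segmented model is shorter because three stems and four suffixes cost fewer letters to list than twelve whole words, while the cost of encoding the tokens stays about the same.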

Similar resources

An algorithm for the unsupervised learning of morphology

This paper describes in detail an algorithm for the unsupervised learning of natural language morphology, with emphasis on challenges that are encountered in languages typologically similar to European languages. It utilizes the Minimum Description Length analysis described in Goldsmith 2001 and has been implemented in software that is available for downloading and testing. 1. Scope of this pap...

Natural Language Processing Of Morphology With Linguistically Motivated Applications To German Linking Elements

A survey of the history of the learning of morphological rules is presented. Further investigation is made into the current state of NLP techniques with regard to supervised and unsupervised learning of morphology. An analysis of the outstanding problem of “German linking elements” is presented and reviewed. Finally, a proposal is made with the goal of applying current morphological analysis and ...

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

Unsupervised Morphological Relatedness

Assessment of the similarities between texts has been studied for decades from different perspectives and for several purposes. One interesting perspective is morphology. This article reports the results of a study on the assessment of the morphological relatedness between natural language words. The main idea is to adapt a formal string alignment algorithm namely Needleman-Wunsch’s to acco...

Experiments in Unsupervised Learning of Natural Language

Linguistics has invented and discarded many theories of language, and there are currently many competitors to the basic idea of phrase structure grammars as capturing the syntactic structure of language. Computational Linguistics has proven to be a testing ground for theories and grammars, and is similarly diverse. Moreover, recently we have learnt that similar principles and techniques may ...

Journal title:
  • Computational Linguistics

Volume 27  Issue 

Pages  -

Publication date 2001